feat: native mouse and keyboard control tools#515
feat: native mouse and keyboard control tools#515senamakel merged 3 commits intotinyhumansai:mainfrom
Conversation
- Introduced `ComputerControlConfig` to manage mouse and keyboard tool activation. - Implemented `KeyboardTool` and `MouseTool` for native input control using platform-native APIs. - Updated configuration schema to include computer control settings. - Enhanced tool registration logic to conditionally include mouse and keyboard tools based on user configuration. - Added comprehensive documentation and tests for new functionalities, ensuring robust integration and usability.
📝 WalkthroughWalkthroughAdds native mouse and keyboard tools, a new Changes
Sequence DiagramsequenceDiagram
participant Client as Client Request
participant Registry as Tool Registry
participant Tool as Mouse/Keyboard Tool
participant Security as SecurityPolicy
participant Enigo as Enigo (Native)
participant Result as ToolResult
Client->>Registry: request tool execution with params
Registry->>Tool: execute(args)
Tool->>Security: can_act()?
alt denied
Security-->>Tool: false
Tool-->>Result: error (permission denied)
else allowed
Security-->>Tool: true
Tool->>Security: record_action()
Tool->>Tool: parse & validate parameters
alt invalid
Tool-->>Result: error (invalid input)
else valid
Tool->>Enigo: spawn_blocking { perform native action }
Enigo-->>Tool: operation complete / errors
Tool-->>Result: success or error
end
end
Result-->>Client: ToolResult
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
src/openhuman/tools/ops.rs (1)
157-162: Please lock down the new gate with a small registry test.This branch is the only thing keeping dangerous native-input tools default-off, but this file doesn’t yet assert either the disabled or enabled case.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/ops.rs` around lines 157 - 162, Add a small unit/regression test that asserts the native-input gate behavior around root_config.computer_control: verify that when computer_control.enabled is false the resulting tools registry/vector does NOT contain MouseTool or KeyboardTool, and when enabled is true the registry DOES contain MouseTool and KeyboardTool; locate the code that constructs the tools vector (the block that checks root_config.computer_control.enabled and pushes MouseTool::new and KeyboardTool::new) and write tests that toggle that config, call the tool-construction function, and assert presence/absence of those specific types by inspecting the tools vector or registry.src/openhuman/tools/impl/computer/keyboard.rs (1)
145-150: Add debug/trace coverage for blocked and failure paths.Right now only the request/success path is observable. When
can_act/record_actionblocks, validation rejects, or enigo fails, there’s no structured trace explaining why the tool did nothing.As per coding guidelines,
src/**/*.rs: Add substantial, development-oriented logs on new/changed flows; include logs at entry/exit points, branch decisions, external calls, retries/timeouts, state transitions, and error handling paths.Also applies to: 157-161, 182-196, 210-224, 254-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/impl/computer/keyboard.rs` around lines 145 - 150, Add development-oriented trace/debug logs for the blocked/failure branches in keyboard handling: when self.security.can_act() returns false and when self.security.record_action() returns false, log a structured message (including agent id/session/state if available), the reason ("autonomy read-only" or "rate limit exceeded") and early-exit via ToolResult::error; similarly add trace/debug entries around validation failures, enigo calls, and their error branches (the call sites referenced around lines 157-161, 182-196, 210-224, 254-283) so each entry/exit, branch decision, external enigo invocation, and error path emits context-rich logs. Use the existing logging/tracing facility used in the crate (e.g., trace!/debug!/warn! with key fields) and ensure logs include unique identifiers so blocked actions are observable in traces.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tools/impl/computer/keyboard.rs`:
- Around line 254-271: The current spawn_blocking block using Enigo::new and
iterating over keys stops on the first Press/Release error and can leave
modifier keys held; change it to track which keys were successfully pressed
(e.g., a Vec pressed_keys) as you iterate in the Press loop, and if a Press
fails immediately jump to a cleanup section that attempts to Release all
pressed_keys in reverse order (ignoring or logging any Release errors so cleanup
is best-effort). Also wrap the normal end-of-block Release loop in the same
best-effort pattern (release each key and log but do not return early on Release
errors). Use the existing symbols Enigo, keys, HOTKEY_INTER_KEY_DELAY and the
spawn_blocking closure to locate where to implement the pressed_keys tracking
and the guaranteed reverse-order release cleanup.
- Around line 94-97: The hotkey handler currently drops non-string entries and
doesn't enforce modifier-first plus one final non-modifier pattern; update the
hotkey validation to (1) reject any non-string entries up-front, (2) map strings
to Key values and fail if any mapping fails, (3) use is_modifier(&Key) to
require at least one modifier and exactly one trailing non-modifier with all
modifiers preceding it (e.g., modifiers... + final non-modifier), and (4) return
an error/false without sending any key events for malformed arrays. Locate the
hotkey parsing/firing logic near the code that currently filters or iterates
over the array (the block around where strings are converted to Key and keys are
fired) and add this validation before any calls that press/release keys so
malformed inputs like ["Ctrl", 1], ["Ctrl"], or ["a", "Ctrl"] are rejected.
In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 225-239: The drag closure in tokio::task::spawn_blocking calls
enigo.button(Direction::Press) then may return early on subsequent errors
leaving the button pressed; refactor so the press is followed by a best-effort
cleanup that always calls enigo.button(button, Direction::Release) on error or
panic. Concretely: after Enigo::new and the initial move_mouse and
button(...Press) call, perform the remaining operations inside a fallible block
(e.g., let result = (|| -> Result<_, _> { ... })();), and if result.is_err() or
a panic is caught, call enigo.button(button, Direction::Release) (ignoring its
errors) before propagating the original error; alternatively implement a small
RAII/guard type tied to the enigo instance that calls Release in Drop unless
explicitly disarmed. Ensure you reference the existing enigo variable, the
button press/release calls, and the move_mouse invocations when applying the
fix.
- Around line 28-34: parse_button currently returns Button and maps any unknown
string to Button::Left; change its signature to return Result<Button,
anyhow::Error> (or your crate's common error type) and return Err (e.g.,
anyhow::bail!("invalid mouse button: {}", value)) for unknown or non-string
"button" values instead of defaulting to Left, and update all call sites to use
parse_button(&args)? so the error propagates (references: parse_button and the
Button enum).
- Around line 253-255: The "scroll" match arm currently narrows scroll_x and
scroll_y using `as i32`, which can silently wrap out-of-range i64 values; update
the handling for `scroll_x` and `scroll_y` to validate the i64 range before
conversion (e.g., use i32::try_from(...) and handle the Err case or clamp values
to i32::MIN..=i32::MAX) and return or propagate an error when inputs are
invalid; locate the variables `scroll_x` and `scroll_y` in the "scroll" arm of
the match in mouse.rs and replace the `as i32` narrowing with a range-safe
conversion and proper error handling.
---
Nitpick comments:
In `@src/openhuman/tools/impl/computer/keyboard.rs`:
- Around line 145-150: Add development-oriented trace/debug logs for the
blocked/failure branches in keyboard handling: when self.security.can_act()
returns false and when self.security.record_action() returns false, log a
structured message (including agent id/session/state if available), the reason
("autonomy read-only" or "rate limit exceeded") and early-exit via
ToolResult::error; similarly add trace/debug entries around validation failures,
enigo calls, and their error branches (the call sites referenced around lines
157-161, 182-196, 210-224, 254-283) so each entry/exit, branch decision,
external enigo invocation, and error path emits context-rich logs. Use the
existing logging/tracing facility used in the crate (e.g., trace!/debug!/warn!
with key fields) and ensure logs include unique identifiers so blocked actions
are observable in traces.
In `@src/openhuman/tools/ops.rs`:
- Around line 157-162: Add a small unit/regression test that asserts the
native-input gate behavior around root_config.computer_control: verify that when
computer_control.enabled is false the resulting tools registry/vector does NOT
contain MouseTool or KeyboardTool, and when enabled is true the registry DOES
contain MouseTool and KeyboardTool; locate the code that constructs the tools
vector (the block that checks root_config.computer_control.enabled and pushes
MouseTool::new and KeyboardTool::new) and write tests that toggle that config,
call the tool-construction function, and assert presence/absence of those
specific types by inspecting the tools vector or registry.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21ecf1b5-f16c-4381-a4d9-2e6513ea21ea
📒 Files selected for processing (8)
src/openhuman/config/schema/mod.rssrc/openhuman/config/schema/tools.rssrc/openhuman/config/schema/types.rssrc/openhuman/tools/impl/computer/keyboard.rssrc/openhuman/tools/impl/computer/mod.rssrc/openhuman/tools/impl/computer/mouse.rssrc/openhuman/tools/impl/mod.rssrc/openhuman/tools/ops.rs
… gate tests - keyboard hotkey: track pressed keys and always release in reverse on error (prevents stuck modifiers); validate modifier-first pattern (reject ["a","Ctrl"], ["Ctrl"], ["Ctrl","Shift"]) - mouse drag: guarantee button release via best-effort cleanup after press, even when move_mouse fails - mouse parse_button: return Result, reject unknown/non-string values instead of silently defaulting to left - mouse scroll: use i32::try_from instead of silent `as i32` truncation - Add debug logs on security block branches (can_act / record_action) - Add ops.rs regression tests for computer_control enabled/disabled gate
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/tools/impl/computer/mouse.rs (1)
155-161: Usedebug/tracefor routine action-success logs.These success-path diagnostics are currently
info!; switching todebug!aligns better with repo logging policy for development-oriented traces.Suggested log-level adjustment
- info!( + debug!( ... - info!( + debug!( ... - info!( + debug!( ... - info!( + debug!( ... - info!( + debug!(As per coding guidelines: “In Rust, use
log/tracingatdebugortracelevel for development-oriented diagnostics in Rust.”Also applies to: 179-183, 206-210, 265-270, 316-322
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/openhuman/tools/impl/computer/mouse.rs` around lines 155 - 161, Change the success-path tracing calls from info! to debug! for the mouse tool so routine, development-oriented diagnostics are emitted at debug level; specifically replace the info! invocation that logs tool="mouse", action="move", x/x, y/y and the other similar info! blocks (the ones around the reported ranges 179-183, 206-210, 265-270, 316-322) with debug! while keeping the same structured fields and message text, and ensure the tracing/log crate is imported or available for debug! macros where these changes are made.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 91-110: Update the JSON schema descriptions for the "x" and "y"
properties to include that they are required for the "drag" action as well as
move/click/double_click; locate the schema entries for "x" and "y" (keys "x",
"y" in the mouse action schema) and change their description text to mention
"Required for move, click, double_click, and drag" so runtime expectations for
the "drag" action (which uses x/y as end coordinates) match the schema.
---
Nitpick comments:
In `@src/openhuman/tools/impl/computer/mouse.rs`:
- Around line 155-161: Change the success-path tracing calls from info! to
debug! for the mouse tool so routine, development-oriented diagnostics are
emitted at debug level; specifically replace the info! invocation that logs
tool="mouse", action="move", x/x, y/y and the other similar info! blocks (the
ones around the reported ranges 179-183, 206-210, 265-270, 316-322) with debug!
while keeping the same structured fields and message text, and ensure the
tracing/log crate is imported or available for debug! macros where these changes
are made.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1fe9cc88-e5a3-4f2f-aa40-d99496bd1467
📒 Files selected for processing (3)
src/openhuman/tools/impl/computer/keyboard.rssrc/openhuman/tools/impl/computer/mouse.rssrc/openhuman/tools/ops.rs
✅ Files skipped from review due to trivial changes (1)
- src/openhuman/tools/impl/computer/keyboard.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/openhuman/tools/ops.rs
| "x": { | ||
| "type": "integer", | ||
| "description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click." | ||
| }, | ||
| "y": { | ||
| "type": "integer", | ||
| "description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click." | ||
| }, | ||
| "button": { | ||
| "type": "string", | ||
| "enum": ["left", "right", "middle"], | ||
| "description": "Mouse button for click/double_click/drag. Default: left." | ||
| }, | ||
| "start_x": { | ||
| "type": "integer", | ||
| "description": "Drag start X coordinate (absolute). Required for drag." | ||
| }, | ||
| "start_y": { | ||
| "type": "integer", | ||
| "description": "Drag start Y coordinate (absolute). Required for drag." |
There was a problem hiding this comment.
Schema text under-documents drag end coordinates.
At runtime, drag requires end coordinates via x/y (Line 229), but the schema descriptions at Line 93 and Line 97 say those are only for move/click/double_click. This mismatch can cause avoidable tool-call errors.
Suggested schema text fix
- "description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click."
+ "description": "Target X coordinate (absolute screen pixels). Required for move, click, double_click, and drag (end position)."
- "description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click."
+ "description": "Target Y coordinate (absolute screen pixels). Required for move, click, double_click, and drag (end position)."Also applies to: 229-229
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/openhuman/tools/impl/computer/mouse.rs` around lines 91 - 110, Update the
JSON schema descriptions for the "x" and "y" properties to include that they are
required for the "drag" action as well as move/click/double_click; locate the
schema entries for "x" and "y" (keys "x", "y" in the mouse action schema) and
change their description text to mention "Required for move, click,
double_click, and drag" so runtime expectations for the "drag" action (which
uses x/y as end coordinates) match the schema.
Summary
mouseandkeyboard) that useenigofor platform-native input simulation (Core Graphics on macOS, SendInput on Windows, X11/libxdo on Linux)mousetool: move, click, double_click, drag, scroll actions with absolute screen coordinateskeyboardtool: type (text string), press (single key), hotkey (key combinations like Ctrl+C) with comprehensive key name parserPermissionLevel::Dangerousand gated behind[computer_control] enabled = truein config (disabled by default)Changes
src/openhuman/tools/impl/computer/mouse.rsMouseTool— 5 actions, coordinate validation, button selectionsrc/openhuman/tools/impl/computer/keyboard.rsKeyboardTool— 3 actions, key name parser (modifiers, nav, arrows, F-keys, single chars), hotkey sequencingsrc/openhuman/tools/impl/computer/mod.rssrc/openhuman/tools/impl/mod.rscomputermodulesrc/openhuman/tools/ops.rscomputer_control.enabledsrc/openhuman/config/schema/tools.rsComputerControlConfigstructsrc/openhuman/config/schema/types.rscomputer_controltoConfigsrc/openhuman/config/schema/mod.rsComputerControlConfigTest plan
cargo checkpasses cleancargo fmtapplied[computer_control] enabled = true, invoke mouse/keyboard tools via agentSummary by CodeRabbit
computer_controlconfig toggle to enable/disable these input-control tools.